179 research outputs found

    Incorporating prior information into association studies.

    Get PDF
    UnlabelledRecent technological developments in measuring genetic variation have ushered in an era of genome-wide association studies which have discovered many genes involved in human disease. Current methods to perform association studies collect genetic information and compare the frequency of variants in individuals with and without the disease. Standard approaches do not take into account any information on whether or not a given variant is likely to have an effect on the disease. We propose a novel method for computing an association statistic which takes into account prior information. Our method improves both power and resolution by 8% and 27%, respectively, over traditional methods for performing association studies when applied to simulations using the HapMap data. Advantages of our method are that it is as simple to apply to association studies as standard methods, the results of the method are interpretable as the method reports p-values, and the method is optimal in its use of prior information in regards to statistical power.AvailabilityThe method presented herein is available at http://masa.cs.ucla.edu

    Multiple testing correction in linear mixed models.

    Get PDF
    BackgroundMultiple hypothesis testing is a major issue in genome-wide association studies (GWAS), which often analyze millions of markers. The permutation test is considered to be the gold standard in multiple testing correction as it accurately takes into account the correlation structure of the genome. Recently, the linear mixed model (LMM) has become the standard practice in GWAS, addressing issues of population structure and insufficient power. However, none of the current multiple testing approaches are applicable to LMM.ResultsWe were able to estimate per-marker thresholds as accurately as the gold standard approach in real and simulated datasets, while reducing the time required from months to hours. We applied our approach to mouse, yeast, and human datasets to demonstrate the accuracy and efficiency of our approach.ConclusionsWe provide an efficient and accurate multiple testing correction approach for linear mixed models. We further provide an intuition about the relationships between per-marker threshold, genetic relatedness, and heritability, based on our observations in real data

    ForestPMPlot: A Flexible Tool for Visualizing Heterogeneity Between Studies in Meta-analysis.

    Get PDF
    Meta-analysis has become a popular tool for genetic association studies to combine different genetic studies. A key challenge in meta-analysis is heterogeneity, or the differences in effect sizes between studies. Heterogeneity complicates the interpretation of meta-analyses. In this paper, we describe ForestPMPlot, a flexible visualization tool for analyzing studies included in a meta-analysis. The main feature of the tool is visualizing the differences in the effect sizes of the studies to understand why the studies exhibit heterogeneity for a particular phenotype and locus pair under different conditions. We show the application of this tool to interpret a meta-analysis of 17 mouse studies, and to interpret a multi-tissue eQTL study

    Using genomic annotations increases statistical power to detect eGenes.

    Get PDF
    MotivationExpression quantitative trait loci (eQTLs) are genetic variants that affect gene expression. In eQTL studies, one important task is to find eGenes or genes whose expressions are associated with at least one eQTL. The standard statistical method to determine whether a gene is an eGene requires association testing at all nearby variants and the permutation test to correct for multiple testing. The standard method however does not consider genomic annotation of the variants. In practice, variants near gene transcription start sites (TSSs) or certain histone modifications are likely to regulate gene expression. In this article, we introduce a novel eGene detection method that considers this empirical evidence and thereby increases the statistical power.ResultsWe applied our method to the liver Genotype-Tissue Expression (GTEx) data using distance from TSSs, DNase hypersensitivity sites, and six histone modifications as the genomic annotations for the variants. Each of these annotations helped us detected more candidate eGenes. Distance from TSS appears to be the most important annotation; specifically, using this annotation, our method discovered 50% more candidate eGenes than the standard permutation [email protected] or [email protected]

    A theory-based practical solution to correct for sex-differential participation bias

    Get PDF
    Most genomic cohorts are retrospective where the exposures and outcomes are predetermined prior to sample collection. Therefore, a spurious association between an exposure and an outcome can arise if both variables affect study participation. Such concerns were raised in previous studies questioning the representativeness of the UK Biobank. Recently, a genome-wide association study (GWAS) on biological sex found many autosomal hits and non-negligible autosomal heritability which the authors attribute to selection bias. In this study, we propose a simple and a practical method that can overcome sex-driven selection bias based on theoretical analysis and simulations.This work was supported by the National Research Foundation of Korea (NRF) (Grant number 2022R1A2B5B02001897) funded by the Korean government, Ministry of Science, and ICT. This study was supported by the Creative-Pioneering Researchers Program funded by Seoul National University (SNU)

    Effectively Identifying eQTLs from Multiple Tissues by Combining Mixed Model and Meta-analytic Approaches

    Get PDF
    Gene expression data, in conjunction with information on genetic variants, have enabled studies to identify expression quantitative trait loci (eQTLs) or polymorphic locations in the genome that are associated with expression levels. Moreover, recent technological developments and cost decreases have further enabled studies to collect expression data in multiple tissues. One advantage of multiple tissue datasets is that studies can combine results from different tissues to identify eQTLs more accurately than examining each tissue separately. The idea of aggregating results of multiple tissues is closely related to the idea of meta-analysis which aggregates results of multiple genome-wide association studies to improve the power to detect associations. In principle, meta-analysis methods can be used to combine results from multiple tissues. However, eQTLs may have effects in only a single tissue, in all tissues, or in a subset of tissues with possibly different effect sizes. This heterogeneity in terms of effects across multiple tissues presents a key challenge to detect eQTLs. In this paper, we develop a framework that leverages two popular meta-analysis methods that address effect size heterogeneity to detect eQTLs across multiple tissues. We show by using simulations and multiple tissue data from mouse that our approach detects many eQTLs undetected by traditional eQTL methods. Additionally, our method provides an interpretation framework that accurately predicts whether an eQTL has an effect in a particular tissue

    Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies

    Get PDF
    Expression quantitative trait loci (eQTL) mapping is a tool that can systematically identify genetic variation affecting gene expression. eQTL mapping studies have shown that certain genomic locations, referred to as regulatory hotspots, may affect the expression levels of many genes. Recently, studies have shown that various confounding factors may induce spurious regulatory hotspots. Here, we introduce a novel statistical method that effectively eliminates spurious hotspots while retaining genuine hotspots. Applied to simulated and real datasets, we validate that our method achieves greater sensitivity while retaining low false discovery rates compared to previous methods

    Comparison of Two Meta-Analysis Methods: Inverse-Variance-Weighted Average and Weighted Sum of Z-Scores

    Get PDF
    The meta-analysis has become a widely used tool for many applications in bioinformatics, including genome-wide association studies. A commonly used approach for meta-analysis is the fixed effects model approach, for which there are two popular methods: the inverse variance-weighted average method and weighted sum of z-scores method. Although previous studies have shown that the two methods perform similarly, their characteristics and their relationship have not been thoroughly investigated. In this paper, we investigate the optimal characteristics of the two methods and show the connection between the two methods. We demonstrate that the each method is optimized for a unique goal, which gives us insight into the optimal weights for the weighted sum of z-scores method. We examine the connection between the two methods both analytically and empirically and show that their resulting statistics become equivalent under certain assumptions. Finally, we apply both methods to the Wellcome Trust Case Control Consortium data and demonstrate that the two methods can give distinct results in certain study designs

    Meta-Analysis Identifies Gene-by-Environment Interactions as Demonstrated in a Study of 4,965 Mice

    Get PDF
    Identifying environmentally-specific genetic effects is a key challenge in understanding the structure of complex traits. Model organisms play a crucial role in the identification of such gene-by-environment interactions, as a result of the unique ability to observe genetically similar individuals across multiple distinct environments. Many model organism studies examine the same traits but under varying environmental conditions. For example, knock-out or diet-controlled studies are often used to examine cholesterol in mice. These studies, when examined in aggregate, provide an opportunity to identify genomic loci exhibiting environmentally-dependent effects. However, the straightforward application of traditional methodologies to aggregate separate studies suffers from several problems. First, environmental conditions are often variable and do not fit the standard univariate model for interactions. Additionally, applying a multivariate model results in increased degrees of freedom and low statistical power. In this paper, we jointly analyze multiple studies with varying environmental conditions using a meta-analytic approach based on a random effects model to identify loci involved in gene-by-environment interactions. Our approach is motivated by the observation that methods for discovering gene-by-environment interactions are closely related to random effects models for meta-analysis. We show that interactions can be interpreted as heterogeneity and can be detected without utilizing the traditional uni- or multi-variate approaches for discovery of gene-by-environment interactions. We apply our new method to combine 17 mouse studies containing in aggregate 4,965 distinct animals. We identify 26 significant loci involved in High-density lipoprotein (HDL) cholesterol, many of which are consistent with previous findings. Several of these loci show significant evidence of involvement in gene-by-environment interactions. An additional advantage of our meta-analysis approach is that our combined study has significantly higher power and improved resolution compared to any single study thus explaining the large number of loci discovered in the combined study
    • …
    corecore